1. Field of the Invention
The present invention relates to systems, methods, programs, and storage media for automatic form recognition. Specifically, the present invention relates to a system, method, program, and storage medium for recognizing forms that each contain multiple pages.
2. Description of the Related Art
Form recognition, in which forms are automatically classified according to previously registered formats, is a highly effective technique for performing entry processing for large quantities of forms. A form recognition system extracts feature values from form image data read by a scanner or the like, and creates form format data. The form recognition system then calculates the similarity between a search form, that is, the form to be recognized, and each registered form, and determines the registered form with the highest similarity to be the recognition result.
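The matching step described above can be sketched as follows. This is a minimal illustration only: the feature vectors, the use of cosine similarity, and the names `cosine_similarity`, `recognize`, and `registered_forms` are assumptions made for the example and are not taken from any particular form recognition system.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(search_features, registered):
    """Return (name, similarity) of the registered form most similar to the search form."""
    if not registered:
        raise ValueError("no registered forms")
    # Score the search form against every registered form format.
    scores = {
        name: cosine_similarity(search_features, features)
        for name, features in registered.items()
    }
    # The registered form with the highest similarity is the recognition result.
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical registered form formats, each reduced to a small feature vector.
registered_forms = {
    "invoice": [0.6, 0.3, 0.1],
    "order_sheet": [0.2, 0.2, 0.6],
}

result_name, result_score = recognize([0.55, 0.35, 0.10], registered_forms)
print(result_name, round(result_score, 3))
```

In this sketch each form is compared independently of any other page, which is the page-by-page behavior assumed by such systems.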
For form recognition, Japanese Patent Laid-Open No. 2000-285187 focuses attention on tables appearing on a form and determines the similarity based on the proportion of each table area to the total area of all the tables, so that the calculated similarity is close to the similarity perceived from visual appearance. However, since the similarity is calculated for each image entered, that is, on a page-by-page basis, the form recognition system disclosed in Japanese Patent Laid-Open No. 2000-285187 is unsuitable for recognizing forms that each contain multiple pages.
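By way of illustration only, a similarity based on table area proportions might be computed as in the following sketch. The specific formula (one minus half the sum of absolute differences between sorted proportion lists), the zero padding for differing table counts, and the names `area_proportions` and `proportion_similarity` are assumptions for this example; they are not the method disclosed in the cited publication.

```python
def area_proportions(table_areas):
    """Proportion of each table area to the total area of all tables, largest first."""
    total = sum(table_areas)
    if total <= 0:
        return []
    # Sorting ignores table position; this is a simplifying assumption.
    return sorted((area / total for area in table_areas), reverse=True)


def proportion_similarity(areas_a, areas_b):
    """Similarity in [0, 1] between two pages based on table area proportions."""
    p = area_proportions(areas_a)
    q = area_proportions(areas_b)
    if not p or not q:
        return 0.0
    # Pad the shorter list with zeros so pages with different table counts compare.
    n = max(len(p), len(q))
    p += [0.0] * (n - len(p))
    q += [0.0] * (n - len(q))
    # Both lists sum to 1, so the sum of absolute differences is at most 2.
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(p, q))


# Two pages whose tables have the same relative sizes at different scan scales.
same_layout = proportion_similarity([100, 300], [10, 30])
# One page with a single table versus a page with two equal tables.
different_layout = proportion_similarity([100], [50, 50])
```

Because the input to such a calculation is the set of tables on a single page image, the result describes one page only.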
For the recognition of forms with multiple pages, Japanese Patent Laid-Open No. 10-269311 discloses a method in which a partition form is placed at the top of each block of pages in order to process multiple pages as a single unit. That is, every time a form is read, it is determined whether or not the form is a partition form, and, if it is, the forms from the subsequent form through the form immediately before the next partition form are processed as a single unit. However, the form recognition disclosed in Japanese Patent Laid-Open No. 10-269311 is disadvantageous in that it requires the cumbersome step of inserting partition forms whenever forms with multiple pages are handled.
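The partition-form approach described above can be sketched as follows. This is a simplified illustration: the function name `group_by_partition`, the `is_partition` predicate, and the `"SEP"` marker are hypothetical, and the handling of pages read before the first partition form is an assumption of the sketch.

```python
def group_by_partition(pages, is_partition):
    """Split a stream of scanned pages into units delimited by partition forms."""
    units = []
    current = None
    for page in pages:
        if is_partition(page):
            # A partition form starts a new unit and is not itself part of it.
            current = []
            units.append(current)
        else:
            if current is None:
                # Assumption: pages read before any partition form make up their own unit.
                current = []
                units.append(current)
            current.append(page)
    return units


# Hypothetical scan order: "SEP" stands in for a detected partition form.
scanned = ["SEP", "a1", "a2", "SEP", "b1"]
units = group_by_partition(scanned, lambda page: page == "SEP")
print(units)
```

The sketch makes the drawback visible: the grouping is correct only if an operator has physically inserted a partition form ahead of every block of pages before scanning.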